Richard Apodaca's Depth First: Walking the Web of Chem Informatics

Started: November 3, 2006
Finished: In progress

Overview: Rich's blog is a practitioner's dream. It has a lot of information about the ways in which chemical information can be shared. While I don't get into this side of open chemistry, I do value the little comments that Rich makes along the way.

Notes:

August 16, 2006

"Is it possible that calls for Open Access in scientific publishing are missing something fundamental in the culture of science? Why has the nature of scientific publishing changed so little in almost 300 years of continuous operation? Maybe revolutionary advances in information technology kept the system viable."

August 23, 2006

"Science moves forward only insofar as observations can be validated and put to use by a third party. Chemical informatics is no different from any other field in this respect."

"The ACS has recently spoken out on the necessity of open data sets. As a condition for publication, any data reported in a manuscript must now either appear in Supplementary Material or be “readily available, without infringements or restrictions.” Although this is a positive development, the wait continues for an equivalent statement on the availability of source code."

"Open software systems and open data packages are most useful when they can be readily found by others and used together. In an effort to work on this problem, several individuals, including myself, formed The Blue Obelisk group. Through this group and others like it, like-minded researchers can begin to reap the benefits of openness enjoyed by other fields."

September 3, 2006

I liked this post because it gave me an overview of how cheminformatics differs from, and resembles, other disciplines:

"It's Hard for Computers to Deal with Chemical Structures. The chemical structure may be a universally understandable language for humans, but not for computers. Searching for chemical structures requires much more advanced and computationally intensive technologies than searching for text. These factors place constraints on chemical information systems not present in other fields."

"Chemical Information is Durable. Chemical information has a very long shelf life. A synthetic procedure written 100 years ago can be just as useful as one written this month. The resulting demand for depth of coverage by chemists is unparalleled."

"Chemical Information is Context-Neutral. Chemical information generated in one field (say, organometallic chemistry) is frequently used in a very different field (say, polymer chemistry). If anything, the trend over the last twenty years has been toward more interdisciplinary chemical research. Naturally, chemists demand that breadth of coverage in their information systems be correspondingly high."

Unknown Date 1 (September 2006)
This is an interesting post because it addresses the collaborative nature of cheminformatics. Unfortunately, the blogger didn't date-stamp the post.

"Several online chemical information services, including PubChem, NMRShiftDB, and ZINC, have emerged in a relatively short period of time. As these systems go from being toys for hackers to essential components of scientific workflow, their true potential will be unlocked by developing innovative ways to tie these disparate systems together."

"A future in which Chemical Abstracts Service no longer dominates the collection and distribution of chemical information is looking more possible than ever before. If recent history is any guide to this future, we can look to an array of semi-independent, open systems using open standards and operating on a global scale to become the new focal point. In fact, the capability exists today."

September 22, 2006
This post is very relevant to my study.

"Like no other medium, the Internet tests our basic beliefs about the rights of resource owners and resource users. As the Internet increasingly becomes home to scientific publication mechanisms that have no counterpart in the physical world, a larger question looms: what separates fair use of these services from abuse?"

"The availability of open chemical information resources like PubChem and NMRShiftDB is a very recent phenomenon, and desperately overdue. One premise of this blog is that chemical informatics is at the start of a renaissance; the chemical information revolution that started in the 1950's is now set to continue after a long period of stagnation. Large, open data sources, and open software that mines it, will fuel this transformation, just as they have in bioinformatics."

"Getting back to accessing PubChem data, one very far-sighted thing the NIH has done is to make the entire dataset freely downloadable in three different file formats. Rather than mine the PubChem website itself, you could download the data to your machine, letting the software you write access it locally. The sheer size of this dataset creates problems of its own. Future articles will describe some approaches to solving them."

"Regardless of your views on the use and abuse of chemical information resources like PubChem, it's clear that getting open resources on the Web is only the first in a long series of controversial steps that will ultimately transform both the practice and culture of research."

September 23, 2006
This post addresses the tensions between commercial and open chemistry.

"The move toward open, Web-based chemical information resources is fully underway. The genie has been let out of the bottle, and there's no putting him back. This is bad news for large, established chemical information players. Their business models based on restricting information flow will be irreversibly disrupted. It's good news for tens of thousands of researchers who will be able to exploit chemical information in ways unimaginable today. Leading the way will be mashups that creatively tie diverse Web resources together, and dynamic programming languages like Ruby that make doing so easy."

September 27, 2006
This post makes interesting points about the nature of open source.

"Open Source licensing is nothing short of revolutionary. Of all of the things an Open Source license makes possible, perhaps the most far-reaching is the right of licensees to create and distribute derivative works. This is what separates "software that's free" ("free as in beer") from "Free Software" ("free as in speech"). A licensee that is not free to create and distribute derivative works has virtually no incentive to build on what the original creator has given away. Would you contribute your valuable time to improving something that you knew you could never use as you saw fit? This may sound like semantic hair-splitting, but it's far from it. None of the phenomenal progress made in Open Source software would have been possible without the basic rights to create and distribute derivative works."

"PubChem's Copyright Disclaimer should give anyone familiar with Open Source licensing grounds to ponder. Apparently, NIH is telling its users that it doesn't have the authority to grant them the right to copy all PubChem content or distribute derivative works. But what parts of PubChem can these rights be granted for, if any? What parts of PubChem are copyrighted, and therefore owned, by contributors? How can a user find out which parts of PubChem are subject to copyright claims by contributors?"

"It isn't too difficult to imagine a scenario in which PubChem requires those depositing data to agree to a copyright waiver. This waiver would simply grant PubChem users the sublicensable right to copy a depositor's content verbatim and to distribute derivative works based on it, royalty-free. The depositor would still retain any copyright they might want to assert outside of PubChem. If the depositor doesn't own these rights, or isn't willing to part with them, then that content would be rejected. This has been done for years in Open Source software projects and is being done increasingly with Creative Commons licenses for non-software intellectual property. Both approaches have strengths and weaknesses, and my aim is not to advocate either one. The point is simply that the idea is not new."

"Maybe a copyright waiver isn't feasible. Regardless, PubChem could create a mechanism whereby content for which a contributor is asserting copyright claims can be identified as such and optionally avoided by its users."

September 29, 2006
This post addresses the nature of open source.

"The PubChem FTP-server is a treasure trove of useful data that's available free of charge. Using simple tools like those discussed here, it's possible to generate a virtually infinite variety of customized views of this valuable resource. Many creative, and novel, applications are possible by combining the capabilities shown here with those of Open Source chemical informatics software, such as RCDK, and other Open data sources, such as NMRShiftDB."
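
A note to self, not from Rich's post: to get a feel for what a "customized view" of downloaded PubChem data might look like, here is a minimal sketch in Python. It assumes the data has been downloaded in SDF format and that each record carries data fields tagged PUBCHEM_COMPOUND_CID, PUBCHEM_MOLECULAR_FORMULA, and PUBCHEM_MOLECULAR_WEIGHT. The function names and the toy two-record sample are my own illustration, not the tools Rich describes.

```python
import io


def iter_sdf_records(handle):
    """Yield one dict of data fields per SDF record read from a text handle.

    The connection table (molblock) is skipped; only "> <TAG>" data items
    are collected. Records end with a line containing "$$$$".
    """
    fields = {}
    tag = None
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":
            # End of record: hand back what was collected and start over.
            yield fields
            fields, tag = {}, None
        elif line.startswith(">") and "<" in line:
            # Data header line such as "> <PUBCHEM_COMPOUND_CID>".
            tag = line[line.index("<") + 1 : line.rindex(">")]
            fields[tag] = ""
        elif tag is not None:
            if line.strip():
                # Value line; multi-line values are joined with newlines.
                fields[tag] = (fields[tag] + "\n" + line) if fields[tag] else line
            else:
                # A blank line closes the current data item.
                tag = None


def light_compounds(handle, max_weight):
    """Hypothetical 'view': (CID, formula) pairs at or below max_weight."""
    for rec in iter_sdf_records(handle):
        weight = float(rec.get("PUBCHEM_MOLECULAR_WEIGHT", "inf"))
        if weight <= max_weight:
            yield rec.get("PUBCHEM_COMPOUND_CID", ""), rec.get("PUBCHEM_MOLECULAR_FORMULA", "")


# Toy stand-in for a downloaded file; the connection tables are omitted,
# so these are not complete SDF records.
SAMPLE = """\
962
  toy record (connection table omitted)

M  END
> <PUBCHEM_COMPOUND_CID>
962

> <PUBCHEM_MOLECULAR_FORMULA>
H2O

> <PUBCHEM_MOLECULAR_WEIGHT>
18.015

$$$$
241
  toy record (connection table omitted)

M  END
> <PUBCHEM_COMPOUND_CID>
241

> <PUBCHEM_MOLECULAR_FORMULA>
C6H6

> <PUBCHEM_MOLECULAR_WEIGHT>
78.11

$$$$
"""

if __name__ == "__main__":
    print(list(light_compounds(io.StringIO(SAMPLE), 50.0)))
```

With a real download, the same functions should work on a file opened in text mode (for example via gzip.open(path, "rt") for a compressed file), since they read line by line and never hold the whole dataset in memory.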

October 19, 2006
This is a clever article that hits the topic head on:

"Did I mention that OJS is free software - as in speech? The developers of OJS have licensed their work under the GPL, giving publishers the ability to control every aspect of how their journal management system operates. Standing out from the crowd will no doubt be an essential component of staying competitive in a world in which almost anyone can start their own journal."

"Open Source tools like Open Journal Systems have the potential to radically change the rules of the scientific publication game. By slashing the costs of both success and failure in scientific publication to almost zero, these systems are set to unleash an unprecedented wave of disruptive innovation - and not a moment too soon. What are the true costs of producing a quality Open Access scientific publication - and who pays? Will the idea of starting your own Open Access journal to address deficiencies with existing offerings catch on, especially in chemistry, chemical informatics, and computational chemistry? Before long, we will have answers to these questions."

October 18, 2006
Another useful post about the struggle and competition in open access publishing:

"The Directory of Open Access Journals (DOAJ) currently lists 2420 Open Access scholarly journals. Of these, 52 currently fall under the category of chemistry. Although the organic chemistry subcategory only currently lists three journals, the general chemistry category actually contains several journals containing organic chemistry content, such as the Bulletin of the Korean Chemical Society, Chemical and Pharmaceutical Bulletin, and Molbank."

"Clearly, the chemistry journals included in DOAJ's listings would not be considered to be in "the mainstream" by experts in the field. And that's exactly the point. Innovation always happens at the margins."

"It seems very unlikely that scientific publishing operates according to a different set of rules than any other technology-driven business. The coming wave of disruptive innovation will be dramatic, and the outcome completely predictable."